
Are Discoveries Spurious? Distributions of Maximum Spurious Correlations and Their Applications

Abstract

Over the last two decades, many exciting variable selection methods have been developed for finding a small group of covariates that are associated with the response from a large pool. Can the discoveries from these data mining approaches be spurious due to high dimensionality and limited sample size? Can our fundamental assumptions about the exogeneity of the covariates needed for such variable selection be validated with the data? To answer these questions, we need to derive the distributions of the maximum spurious correlations given a certain number of predictors, namely, the distribution of the correlation of a response variable $Y$ with the best $s$ linear combinations of $p$ covariates $\mathbf{X}$, even when $\mathbf{X}$ and $Y$ are independent. When the covariance matrix of $\mathbf{X}$ possesses the restricted eigenvalue property, we derive such distributions for both a finite $s$ and a diverging $s$, using Gaussian approximation and empirical process techniques. However, such a distribution depends on the unknown covariance matrix of $\mathbf{X}$. Hence, we use the multiplier bootstrap procedure to approximate the unknown distributions and establish the consistency of such a simple bootstrap approach. The results are further extended to the situation where the residuals are from regularized fits. Our approach is then used to construct the upper confidence limit for the maximum spurious correlation and to test the exogeneity of the covariates. The former provides a baseline for guarding against false discoveries and the latter tests whether our fundamental assumptions for high-dimensional model selection are statistically valid. Our techniques and results are illustrated with both numerical examples and real data analysis.
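The phenomenon the abstract describes can be illustrated with a small simulation. The sketch below is not the authors' multiplier bootstrap: it assumes the simplest case $s = 1$, draws $\mathbf{X}$ and $Y$ as independent standard Gaussians (so the covariance matrix is known rather than estimated), and uses plain Monte Carlo to approximate the distribution of the largest absolute sample correlation. The function names and the default values of `n`, `p`, `reps`, and `level` are hypothetical choices made for illustration.

```python
import numpy as np

def max_spurious_correlation(n, p, rng):
    """Largest absolute sample correlation between Y and any single column
    of X, where X (n x p) and Y (length n) are drawn independently.
    This is the s = 1 case under an assumed standard Gaussian design."""
    X = rng.standard_normal((n, p))
    y = rng.standard_normal(n)
    Xc = X - X.mean(axis=0)          # center each covariate
    yc = y - y.mean()                # center the response
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.max(np.abs(corr))

def simulate_upper_limit(n=50, p=1000, reps=200, level=0.95, seed=0):
    """Monte Carlo draws of the maximum spurious correlation and their
    empirical `level` quantile, used here as a rough upper limit."""
    rng = np.random.default_rng(seed)
    draws = np.array([max_spurious_correlation(n, p, rng) for _ in range(reps)])
    return draws, np.quantile(draws, level)

draws, upper = simulate_upper_limit()
print(f"median max |corr| = {np.median(draws):.3f}, "
      f"{0.95:.0%} upper limit = {upper:.3f}")
```

Even though every covariate is independent of the response, the maximum correlation is far from zero when $p$ is much larger than $n$, which is why an observed correlation needs to be compared against such a baseline before it is treated as a discovery.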

Bibliographic details

Authors: Fan, Jianqing; Shao, Qi-Man; Zhou, Wen-Xin
Year: 2017
Original format: PDF

Similar literature

1. ARE DISCOVERIES SPURIOUS? DISTRIBUTIONS OF MAXIMUM SPURIOUS CORRELATIONS AND THEIR APPLICATIONS [J]. Fan Jianqing, Shao Qi-Man, Zhou Wen-Xin. The Annals of Statistics: An Official Journal of the Institute of Mathematical Statistics. 2018, No. 3

2. A drunk and her dog: a spurious relation? Cointegration tests as instruments to detect spurious correlations between integrated time series [J]. Esther Stroe-Kunold, Joachim Werner. Quality and Quantity. 2009, No. 6

3. Spurious Latent Class Problem in the Mixed Rasch Model: A Comparison of Three Maximum Likelihood Estimation Methods under Different Ability Distributions [J]. Sedat Sen. International Journal of Testing: Official Journal of the International Test Commission. 2018, No. 1

4. Increasing Robustness to Spurious Correlations using Forgettable Examples [C]. Yadollah Yaghoobzadeh, Soroush Mehri, Remi Tachet des Combes, Conference of the European Chapter of the Association for Computational Linguistics. 2021

5. A Spurious-Free Switching Buck Converter for Portable Applications. [D]. Alghamdi, Mohammad Khalaf. 2012

6. ARE DISCOVERIES SPURIOUS? DISTRIBUTIONS OF MAXIMUM SPURIOUS CORRELATIONS AND THEIR APPLICATIONS [O]. Jianqing Fan, Qi-Man Shao, Wen-Xin Zhou

7. Power-law distributions based on exponential distributions: Latent scaling, spurious Zipf's law, and fractal rabbits [O]. Chen, Yanguang. 2013

8. Application of Maximum Entropy Analysis to ISAR Imagery and Spurious Scatterer Location in Anechoic Chambers. [R]. Borden, B. 1989
